Rijk Zwaan is active worldwide as a vegetable breeding company that 
focuses on the development of high-quality vegetable varieties for 
professional growers in food-producing horticulture, be that in 
glasshouses, tunnels or outdoors. 



Middle position in the world top 10 of globally operating vegetable breeding companies 

Around 1,900 employees all over the world 

1,000 varieties in approximately 30 different vegetable crops 

Sell seeds in more than 100 different countries all over the world 



. Gene prediction 

. SNP calling: RNA-seq, DNA 

. Gene expression analysis 

. Transcription binding site prediction (miRNA) 

. Protein functional analysis (domain prediction, structural analysis) 

. Orthologue gene prediction 

. Server maintenance 
. G browser 
. SNP database 

. Species: 

Tomato 

Cucumber 

Melon / watermelon 

Pepper 

Lettuce ... 



SNP identification for watermelon 




Single-nucleotide polymorphism (SNP, pronounced snip) is 
a DNA sequence variation occurring when a single nucleotide 
(A, T, C or G) in the genome (or other shared sequence) 
differs between members of a biological species or paired 
chromosomes in an individual. 




Resources: 



RZ 900, RZ 901 (parent lines with different phenotypes) 
Illumina HiSeq 2000 paired-end reads 
Reference assembly version: Watermelon v1 

Purposes 

*Used as genetic markers for genetic mapping. 

*Used as genetic markers associated with certain genes or phenotypes 

The interesting SNPs 

1. Homozygous SNPs which differ between the 2 parent lines 

2. Evenly distributed on the chromosome 

3. Can be uniquely amplified by PCR, and are therefore suitable for an SNP validation assay 



SNP identification workflow for watermelon 
BWA + Picard + GATK 
SNP validation 




SNP calling by GATK 
Calling variants with the GATK 



PHASE 1: NGS DATA PROCESSING 
Typically by lane 
Raw reads (BWA, Picard) -> local realignment -> base quality recalibration -> analysis-ready reads 

PHASE 2: VARIANT DISCOVERY AND GENOTYPING 
Typically multiple samples simultaneously 
Analysis-ready reads -> SNPs, indels, structural variations (SVs) -> raw variants 

PHASE 3: INTEGRATIVE ANALYSIS 
Inputs: raw variants (SNPs, indels, structural variations (SVs)) and external data (pedigrees, population structure, known variation, known genotypes) 
Raw variants -> variant quality recalibration, genotype refinement -> analysis-ready variants 

SNP calling result 




VCF format 



scaffold9 721024 . T C 752.98 . AC=2;AF=0.50;AN=4;BaseQRankSum=1.087;DP=34;Dels=0.00;FS=25.118;HRun=1;HaplotypeScore=1.4816;MQ=59.32;MQ0=0;MQRankSum=0.845;QD=41.83;ReadPosRankSum=-0.742;SB=-354.80 GT:AD:DP:GQ:PL 
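
To make the layout of such a record easier to follow, here is a minimal sketch that splits one VCF data line into its fixed columns and an INFO dictionary. It assumes whitespace-separated columns as printed on the slide (real VCF files are tab-delimited), and `parse_vcf_record` is a hypothetical helper name, not part of GATK. The record in the example is a shortened copy of the one above.

```python
def parse_vcf_record(line):
    """Split one VCF data line into its fixed columns and an INFO dictionary."""
    fields = line.split()
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    info_dict = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        # INFO flags have no "=value" part; store them as True
        info_dict[key] = value if value else True
    return {
        "CHROM": chrom, "POS": int(pos), "ID": vid, "REF": ref, "ALT": alt,
        "QUAL": float(qual), "FILTER": flt, "INFO": info_dict,
        # FORMAT lists the keys used in every per-sample column
        "FORMAT": fields[8].split(":") if len(fields) > 8 else [],
    }


record = parse_vcf_record(
    "scaffold9 721024 . T C 752.98 . "
    "AC=2;AF=0.50;AN=4;DP=34;MQ=59.32;QD=41.83 GT:AD:DP:GQ:PL"
)
print(record["CHROM"], record["POS"], record["QUAL"], record["INFO"]["DP"])
```

INFO values are kept as strings here; converting them to numbers is left to the caller because the types differ per key.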



SNP selection criteria 



1. DP > 10 

GT:AD:DP:GQ:PL 1/1:0,18:18:54.18:794,54,0 0/0:16,0:16:48.16:0,48,716 



2. AD ratio <= 0.2 

GT:AD:DP:GQ:PL 1/1:0,18:18:54.18:794,54,0 0/0:16,0:16:48.16:0,48,716 



AD ratio sample 1: alternative homozygous 
AD ratio sample 2: reference homozygous 



1,122,778 homozygous SNPs found differing 
between RZ900 and RZ901 
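
The two criteria could be applied per site roughly as sketched below. Several points are assumptions rather than facts from the slides: the AD ratio is taken here as the depth of the less-supported allele divided by the total allele depth, both parents are required to pass both filters, and all function names are hypothetical. The sketch only handles biallelic sites with numeric DP values.

```python
def parse_sample(fmt, sample):
    """Map FORMAT keys (e.g. GT:AD:DP:GQ:PL) onto one sample column."""
    return dict(zip(fmt.split(":"), sample.split(":")))


def ad_ratio(ad):
    """Assumed definition: depth of the less-supported allele / total depth."""
    depths = [int(x) for x in ad.split(",")]
    total = sum(depths)
    return min(depths) / total if total else 1.0


def is_informative(fmt, parent1, parent2, min_dp=10, max_ad_ratio=0.2):
    """True if both parents pass the filters and are opposite homozygotes."""
    samples = [parse_sample(fmt, s) for s in (parent1, parent2)]
    for s in samples:
        if int(s["DP"]) <= min_dp or ad_ratio(s["AD"]) > max_ad_ratio:
            return False
    # one parent reference homozygous, the other alternative homozygous
    return {s["GT"] for s in samples} == {"0/0", "1/1"}


print(is_informative("GT:AD:DP:GQ:PL",
                     "1/1:0,18:18:54.18:794,54,0",
                     "0/0:16,0:16:48.16:0,48,716"))
```

For the example genotypes shown above, both parents have DP above 10 and an AD ratio of 0, so the site is kept.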




SNP selection 




Step 1. Generating marker sequences without flanking SNPs 



Length: 101 bp, SNP in the middle with 50 bp of flanking sequence on each side 



Format: 

>scaffold1_10477 

TAGTACATTTCTATTATTCAACTGTGAGTTATTTTCGAAGTTTTATTAAT[T/G]TTCGTTTTTTATTTATAACTTTCAATTAATTAGAAAAATAGTAAAAACT 



Excluding the markers with flanking SNPs 

>scaffold1_13888 

AAATATTTTTAAATATAAGAAAGTGTCATTGTTTATCAATAATAGACACT[G/A]ATGGATAAAnATTTTGTTATGTTTGTAACTATTTTGGTTTATTGCTGT 
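
Step 1 can be sketched as follows: cut 50 bp on either side of the SNP from the scaffold sequence, write the alleles in brackets, and drop the marker when another SNP falls inside the flanks. This is an illustration under stated assumptions, not the original script: `build_marker` is a hypothetical name, positions are taken as 1-based, and markers too close to a scaffold end are simply discarded.

```python
def build_marker(seq, snp_pos, ref, alt, other_snps, flank=50):
    """Return 'LEFT[REF/ALT]RIGHT', or None if the marker must be excluded.

    seq        : scaffold sequence
    snp_pos    : 1-based position of the SNP on the scaffold
    other_snps : set of 1-based positions of the other SNPs on the scaffold
    """
    start = snp_pos - flank   # 1-based position of the first flanking base
    end = snp_pos + flank     # 1-based position of the last flanking base
    if start < 1 or end > len(seq):
        return None           # not enough sequence for two full flanks
    if any(start <= p <= end and p != snp_pos for p in other_snps):
        return None           # a flanking SNP is present: exclude the marker
    left = seq[start - 1:snp_pos - 1]
    right = seq[snp_pos:end]
    return f"{left}[{ref}/{alt}]{right}"


# Toy scaffold: 50 x A, the SNP base, 50 x C (invented, not watermelon data)
toy = "A" * 50 + "T" + "C" * 50
print(build_marker(toy, 51, "T", "G", set()))
print(build_marker(toy, 51, "T", "G", {60}))
```

The second call returns `None` because position 60 lies inside the right flank.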





SNP selection 



Step 2. Marker selection 



BLAST the markers against the v1 genome 



Excluding the repetitive markers 




With 100% identical hits in the v1 genome database 
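
One way to read Step 2 is: keep a marker only if BLAST reports exactly one hit for it, and that hit is 100% identical over the full marker length. The sketch below applies that reading to BLAST tabular output (`-outfmt 6`, default 12 columns). The interpretation of "repetitive" as "more than one hit", the function name `unique_markers`, and the example hit lines are all assumptions made for illustration.

```python
from collections import defaultdict


def unique_markers(blast_lines, marker_len=101):
    """Return IDs of markers whose only hit is full-length and 100% identical."""
    hits = defaultdict(list)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # outfmt 6 columns: qseqid sseqid pident length mismatch gapopen ...
        qseqid, pident, length = cols[0], float(cols[2]), int(cols[3])
        hits[qseqid].append((pident, length))
    return sorted(
        q for q, hs in hits.items()
        if len(hs) == 1 and hs[0] == (100.0, marker_len)
    )


# Invented hit lines, for illustration only
example = [
    "scaffold1_10477\tscaffold1\t100.00\t101\t0\t0\t1\t101\t10427\t10527\t2e-48\t187",
    "scaffold1_13888\tscaffold1\t100.00\t101\t0\t0\t1\t101\t13838\t13938\t2e-48\t187",
    "scaffold1_13888\tscaffold5\t95.05\t101\t5\t0\t1\t101\t500\t600\t1e-38\t154",
]
print(unique_markers(example))
```

Here the second marker is dropped because it also hits a second scaffold.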



Last step selection 



Reads preprocessing & alignment: BWA, Samtools 



SNP calling & genotyping 



SNP calling questions 




How to define the DP cutoff? 



Minimum? Maximum? 



RZ900: median=14, average=20.68 



RZ901: median=15, average=20.66 
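
In both lines the average depth is well above the median, which suggests a skewed depth distribution with a minority of very deep sites. A small sketch like the one below can help when trying out candidate cutoffs: it reports the median and average, and the fraction of sites that a given minimum and maximum would keep. The depth values and the cutoffs 10 and 40 are invented for illustration; they are not proposed answers to the question above.

```python
from statistics import mean, median


def depth_summary(depths):
    """Median and average read depth over a list of per-site DP values."""
    return {"median": median(depths), "average": round(mean(depths), 2)}


def fraction_within(depths, min_dp, max_dp):
    """Fraction of sites whose depth lies in [min_dp, max_dp]."""
    kept = sum(1 for d in depths if min_dp <= d <= max_dp)
    return kept / len(depths)


toy_depths = [8, 12, 14, 14, 15, 16, 18, 90]   # invented values
print(depth_summary(toy_depths))
print(fraction_within(toy_depths, 10, 40))
```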
SNP calling questions 




What is the best measurement of SNP quality? 

QUAL: The Phred-scaled probability that a REF/ALT polymorphism exists at this site 
given the sequencing data. 

DP: The DP field describes the total depth of reads that passed the Unified Genotyper's internal quality 
control metrics. 

GQ: The Genotype Quality, or Phred-scaled confidence that the true genotype is the one provided in GT. 

MQ: Root mean square of the mapping quality of the reads across all samples. 
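
Since QUAL and GQ are Phred-scaled and MQ is a root mean square, the short sketch below shows both calculations. The Phred relation Q = -10 * log10(P), with P the probability that the call is wrong, is standard; the function names are hypothetical and the mapping qualities in the example are invented.

```python
import math


def phred_to_error_prob(q):
    """Probability that the call is wrong, given Phred-scaled score q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred-scaled score for an error probability p."""
    return -10 * math.log10(p)


def rms_mapping_quality(mapqs):
    """Root mean square of per-read mapping qualities, as in the MQ field."""
    return math.sqrt(sum(m * m for m in mapqs) / len(mapqs))


print(phred_to_error_prob(30))
print(error_prob_to_phred(0.01))
print(rms_mapping_quality([60, 60, 37, 20]))
```

A score of 30 corresponds to an error probability of 1 in 1,000, and every additional 10 points lowers that probability tenfold.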



QUAL vs MQ: cor=0.16, p<2.2e-16 

QUAL vs DP (RZ900): cor=0.63, p<2.2e-16 

QUAL vs GQ (RZ900): cor=0.6, p<2.2e-16 
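
The slides do not say which correlation coefficient was used. Assuming it was the Pearson coefficient, a value such as those above could be computed from two per-site columns as in this minimal sketch; `pearson` is a hypothetical helper and the numbers in the example are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


qual = [120.5, 752.98, 300.0, 45.2, 980.1]   # invented QUAL values
depth = [9, 18, 12, 6, 25]                   # invented DP values
print(round(pearson(qual, depth), 2))
```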
Conflicts between GT and AD? 

SNP flanking consensus sequence quality? 

Using non-redundant reads? 



Other markers (genetic variation)? 



• Identification of: 

• InDels 

• Structural variants 

• Copy number variants 

• Transposable elements 

And the prediction of their functional consequences. 